Model and Evaluation: Towards Fairness in Multilingual Text Classification
Recently, more and more research has focused on addressing bias in text
classification models. However, existing research mainly focuses on the
fairness of monolingual text classification models, and research on fairness
for multilingual text classification is still very limited. In this paper, we
focus on the task of multilingual text classification and propose a debiasing
framework for multilingual text classification based on contrastive learning.
Our proposed method does not rely on any external language resources and can be
extended to any other languages. The model contains four modules: multilingual
text representation module, language fusion module, text debiasing module, and
text classification module. The multilingual text representation module uses a
multilingual pre-trained language model to represent the text, the language
fusion module makes the semantic spaces of different languages tend to be
consistent through contrastive learning, and the text debiasing module uses
contrastive learning to make the model unable to identify sensitive attributes'
information. The text classification module completes the basic tasks of
multilingual text classification. In addition, existing research on the
fairness of multilingual text classification uses a relatively simple
evaluation approach: fairness is measured with the same equality difference
method used for monolingual models, that is, the evaluation is
performed on a single language. We propose a multi-dimensional fairness
evaluation framework for multilingual text classification, which evaluates the
model's monolingual equality difference, multilingual equality difference,
multilingual equality performance difference, and destructiveness of the
fairness strategy. We hope that our work can provide a more general debiasing
method and a more comprehensive evaluation framework for multilingual text
fairness tasks.
An Effective Deployment of Contrastive Learning in Multi-label Text Classification
The effectiveness of contrastive learning technology in natural language
processing tasks is yet to be explored and analyzed. How to construct positive
and negative samples correctly and reasonably is the core challenge of
contrastive learning. It is even harder to discover contrastive objects in
multi-label text classification tasks. There are very few contrastive losses
proposed previously. In this paper, we investigate the problem from a different
angle by proposing five novel contrastive losses for multi-label text
classification tasks. These are Strict Contrastive Loss (SCL), Intra-label
Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard
Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive
Loss (SLCL). We explore the effectiveness of contrastive learning for
multi-label text classification tasks by the employment of these novel losses
and provide a set of baseline models for deploying contrastive learning
techniques on specific tasks. We further perform an interpretable analysis of
our approach to show how different components of contrastive learning losses
play their roles. The experimental results show that our proposed contrastive
losses can bring improvement to multi-label text classification tasks. Our work
also explores how contrastive learning should be adapted for multi-label text
classification tasks. Comment: Accepted by ACL-Findings 2023, 13 pages
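As a rough illustration of why Jaccard similarity is a natural fit for multi-label contrastive learning, the sketch below scores how strongly two samples overlap in their label sets. It is not the paper's JSCL formulation: the helper `pair_weights`, the toy batch, and the choice to treat two unlabeled samples as unrelated are all assumptions made for this example.

```python
def jaccard(labels_a, labels_b):
    """Jaccard similarity |A & B| / |A | B| between two label sets."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        # Assumption: two samples with no labels are treated as unrelated.
        return 0.0
    return len(a & b) / len(a | b)


def pair_weights(batch_labels):
    """Hypothetical helper: weight each pair (i, j), i != j, by label overlap."""
    n = len(batch_labels)
    return [
        [0.0 if i == j else jaccard(batch_labels[i], batch_labels[j])
         for j in range(n)]
        for i in range(n)
    ]


# Toy batch of three samples with made-up labels.
batch = [{"sports", "politics"}, {"sports"}, {"finance"}]
weights = pair_weights(batch)
```

Pairs that share every label get weight 1, disjoint pairs get weight 0, and partially overlapping pairs fall in between, which is one way a loss could treat them as "partially positive" rather than forcing a binary positive/negative split.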
CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis
Aspect-based sentiment analysis (ABSA), an extensively studied task in natural
language processing (NLP), predicts the sentiment
expressed in a text toward a given aspect. Unfortunately, most
languages lack sufficient annotation resources, so a growing body of recent
work focuses on cross-lingual aspect-based sentiment analysis (XABSA).
However, most recent studies concentrate only on cross-lingual data
alignment rather than model alignment. To this end, we propose a novel
framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based
Sentiment Analysis. Specifically, we design two contrastive strategies, token
level contrastive learning of token embeddings (TL-CTE) and sentiment level
contrastive learning of token embeddings (SL-CTE), to regularize the semantic
space of the source and target languages to be more uniform. Because our framework can
accept datasets in multiple languages during training, it can be
adapted not only to the XABSA task but also to multilingual aspect-based
sentiment analysis (MABSA). To further improve the performance of our model, we
apply knowledge distillation using unlabeled data in the target
language. In the distillation XABSA task, we further compare the
effectiveness of different data (source dataset, translated dataset, and
code-switched dataset). The results show that the proposed method yields
some improvement on all three tasks: XABSA, distillation XABSA, and MABSA.
For reproducibility, our code for this paper is available at
https://github.com/GKLMIP/CL-XABSA
An interpretability framework for Similar case matching
Similar Case Matching (SCM) plays a pivotal role in the legal system by
facilitating the efficient identification of similar cases for legal
professionals. While previous research has primarily concentrated on enhancing
the performance of SCM models, the aspect of interpretability has been
neglected. To bridge the gap, this study proposes an integrated pipeline
framework for interpretable SCM. The framework comprises four modules: judicial
feature sentence identification, case matching, feature sentence alignment, and
conflict resolution. In contrast to current SCM methods, our framework first
extracts feature sentences within a legal case that contain essential
information. Then it conducts case matching based on these extracted features.
Subsequently, our framework aligns the corresponding sentences in two legal
cases to provide evidence of similarity. In instances where the results of case
matching and feature sentence alignment exhibit conflicts, the conflict
resolution module resolves these inconsistencies. The experimental results show
the effectiveness of our proposed framework, establishing a new benchmark for
interpretable SCM.
Common and different features of Chinese and Italian hydrogeological mapping guidelines
The definition of common international guidelines for hydrogeologists compiling high-quality hydrogeological maps has been attempted since the second half of the last century, to address the lack of uniformity among national guidelines caused by the differing geological, hydrogeological, and climatic conditions of countries worldwide. With this aim, the China Geological Survey and the Geological Survey of Italy-ISPRA are undertaking cooperative research on 1:50,000 scale hydrogeological survey and mapping at selected sites in both countries. The project intends to develop a new generation of hydrogeological and groundwater resource maps that are descriptively effective and consistent with field survey data. The project will promote improvements in hydrogeological survey and mapping technologies in the two countries, and its results might even be agreed at a wider international level. Chinese and Italian hydrogeological guidelines share several aspects and concerns: 1) field surveys at the 1:50,000 scale and at the more detailed 1:25,000 scale; 2) construction of a hydrogeological database; 3) publication of the official map in both paper and electronic form; 4) inclusion of several small-scale maps inlaid at the margin of the main map in the hydrogeological map layout; 5) a comparable level of required survey quota. Furthermore, more attention will be paid to 3D maps, conceptual models, aquifer structure, the groundwater cycle, and the description of hydrogeological parameters. The most important difference is the following. The hydrogeological mapping guidelines of Italy integrate specifications for both survey and mapping, i.e. their structural layout presents survey contents followed by mapping contents and reflects a technical route of surveying for mapping. In contrast, the current hydrogeological guidelines of China contain no mapping contents, which therefore needed to be formulated.
The Italian guidelines could provide important references for China in legend organization, mapping rules, survey quota and so on. Finally, the collaboration between China and Italy is of great significance for the two ancient civilized countries sharing the “One Belt and One Road” international initiative.
Exploring Post-Training Quantization of Protein Language Models
Recent advancements in unsupervised protein language models (ProteinLMs),
like ESM-1b and ESM-2, have shown promise in different protein prediction
tasks. However, these models face challenges due to their high computational
demands, significant memory needs, and latency, restricting their usage on
devices with limited resources. To tackle this, we explore post-training
quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of
AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize
all weights and activations of ProteinLMs. We observed that the typical uniform
quantization method performs poorly on ESMFold, causing a significant drop in
TM-Score when using 8-bit quantization. We conducted extensive quantization
experiments, uncovering unique challenges associated with ESMFold, particularly
highly asymmetric activation ranges before Layer Normalization, making
representation difficult using low-bit fixed-point formats. To address these
challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise
linear quantization for asymmetric activation values to ensure accurate
approximation. We demonstrated the effectiveness of our method in protein
structure prediction tasks, showing that ESMFold can be
quantized to low bit widths without compromising accuracy. Additionally, we
applied our method to the contact prediction task, showcasing its versatility.
In summary, our study introduces an innovative PTQ method for ProteinLMs,
addressing specific quantization challenges and potentially leading to the
development of more efficient ProteinLMs with significant implications for
various protein-related applications. Comment: 8 pages, 4 figures
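To make the difficulty with asymmetric activation ranges concrete, the following toy sketch compares a single uniform quantizer with a two-region piecewise scheme that uses the same total number of levels. It is not the method proposed in the paper: the activation values, the range, the breakpoint at 1.0, and the even split of levels between regions are all assumptions chosen for illustration.

```python
def uniform_quantize(x, lo, hi, levels):
    """Round x to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    x = min(max(x, lo), hi)  # clip to the representable range
    return lo + round((x - lo) / step) * step


def piecewise_quantize(x, lo, hi, breakpoint, levels):
    """Two regions: half the levels cover [lo, breakpoint], half cover (breakpoint, hi]."""
    half = levels // 2
    if x <= breakpoint:
        return uniform_quantize(x, lo, breakpoint, half)
    return uniform_quantize(x, breakpoint, hi, half)


def mean_abs_error(values, quantizer):
    """Average absolute reconstruction error over a list of values."""
    return sum(abs(v - quantizer(v)) for v in values) / len(values)


# Made-up activations: most values are small, one is a large positive outlier.
values = [-0.9, -0.3, 0.05, 0.4, 0.8, 150.0]
lo, hi, levels = -1.0, 200.0, 256  # 256 levels corresponds to 8 bits

err_uniform = mean_abs_error(
    values, lambda v: uniform_quantize(v, lo, hi, levels))
err_piecewise = mean_abs_error(
    values, lambda v: piecewise_quantize(v, lo, hi, 1.0, levels))
```

With one uniform grid, the step size is set by the outlier, so the many small values share only a handful of levels. Giving the dense region its own grid lowers the error on those values at the cost of coarser resolution in the tail.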
The Greenhouse Gas Emission from Portland Cement Concrete Pavement Construction in China.
This study proposes an inventory analysis method to evaluate the greenhouse gas (GHG) emissions from Portland cement concrete pavement construction, based on a case project in the west of China. The concrete pavement construction process was divided into three phases, namely raw material production, concrete manufacture and pavement onsite construction. The GHG emissions of the three phases are analyzed by a life cycle inventory method. The CO₂e is used to indicate the GHG emissions. The results show that for 1 km Portland cement concrete pavement construction, the total CO₂e is 8215.31 tons. Based on the evaluation results, the CO₂e of the raw material production phase is 7617.27 tons, accounting for 92.7% of the total GHG emissions; the CO₂e of the concrete manufacture phase is 598,033.10 kg, accounting for 7.2% of the total GHG emissions. Lastly, the CO₂e of the pavement onsite construction phase is 8396.59 kg, accounting for only 0.1% of the total GHG emissions. The main greenhouse gas is CO₂ in each phase, which accounts for more than 98% of total emissions. N₂O and CH₄ emissions are relatively insignificant.
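The abstract reports one phase in tons and two in kilograms, so the shares are easier to compare after converting everything to tons. The short sketch below does that conversion and computes each phase's share of the reported total. The figures come from the abstract; the variable names and the assumption of 1000 kg per ton are introduced here.

```python
KG_PER_TON = 1000.0  # assumption: metric tons

reported_total_t = 8215.31  # total CO2e per km, as reported

# Phase emissions in tons of CO2e (two phases are reported in kg).
phases_t = {
    "raw material production": 7617.27,
    "concrete manufacture": 598_033.10 / KG_PER_TON,
    "pavement onsite construction": 8396.59 / KG_PER_TON,
}

# Share of each phase relative to the reported total, in percent.
shares = {
    name: 100.0 * tons / reported_total_t
    for name, tons in phases_t.items()
}

for name, pct in shares.items():
    print(f"{name}: {phases_t[name]:.2f} t CO2e, {pct:.2f}% of reported total")
```

The computed shares agree with the reported 92.7%, 7.2% and 0.1% to within about 0.1 percentage point.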
Cancer Nanotechnology: Enhancing Tumor Cell Response to Chemotherapy for Hepatocellular Carcinoma Therapy
Hepatocellular carcinoma (HCC) is one of the deadliest cancers due to its complexity, recurrence after surgical resection, metastasis and heterogeneity. In addition to sorafenib and lenvatinib, which are approved by the FDA for the treatment of HCC, various strategies including transarterial chemoembolization, radiotherapy, locoregional therapy and chemotherapy have been investigated in clinics. Recently, cancer nanotechnology has attracted great attention for the treatment of various cancers including HCC. Both passive and active targeting are progressing at a steady rate. Herein, we describe the lessons learned from the pathogenesis of HCC and the understanding of targeted and non-targeted nanoparticles used for the delivery of small molecules, monoclonal antibodies, miRNAs and peptides. The aim of exploring current efficacy is to enhance tumor cell response to chemotherapy. This review highlights the opportunities and challenges faced by nanotechnologies in contemporary hepatocellular carcinoma therapy, where personalized medicine is increasingly becoming the mainstay. The overall objective of this review is to enhance our understanding of the design and development of nanotechnology for the treatment of HCC.
Investigation of genetic diversity and population structure of common wheat cultivars in northern China using DArT markers
Background: In order to help establish heterotic groups of Chinese northern wheat cultivars (lines), Diversity arrays technology (DArT) markers were used to investigate the genetic diversity and population structure of Chinese common wheat (Triticum aestivum L.). Results: In total, 1637 of 7000 DArT markers were polymorphic and scored with high confidence among a collection of 111 lines composed mostly of cultivars and breeding lines from northern China. The polymorphism information content (PIC) of DArT markers ranged from 0.03 to 0.50, with an average of 0.40, with P > 80 (reliable markers). With principal-coordinates analysis (PCoA) of DArT data either from the whole genome or from the B-genome alone, all lines fell into one of two major groups reflecting 1RS/1BL type (1RS/1BL and non-1RS/1BL). Evidence of geographic clustering of genotypes was also observed using DArT markers from the A genome. Cluster analysis based on the unweighted pair-group method with arithmetic mean suggested the existence of two subgroups within the non-1RS/1BL group and four subgroups within the 1RS/1BL group. Furthermore, analysis of molecular variance (AMOVA) revealed highly significant (P < 0.001) genetic variance within and among subgroups and among groups. Conclusion: These results provide valuable information for selecting crossing parents and establishing heterotic groups in the Chinese wheat-breeding program.
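The abstract does not say how PIC was computed. A common convention for dominant, two-state markers such as DArT, assumed here, is PIC = 1 − (p² + q²), where p is the fraction of lines in which the marker is present and q = 1 − p. Under that assumption PIC peaks at 0.5 when p = 0.5, which is consistent with the upper end of the reported range. The function name and the toy score vectors below are hypothetical.

```python
def pic_biallelic(presence_calls):
    """PIC for a two-state marker, assuming PIC = 1 - (p^2 + q^2).

    `presence_calls` is a list of 0/1 scores, one per line (cultivar).
    """
    p = sum(presence_calls) / len(presence_calls)  # frequency of "present"
    q = 1.0 - p                                    # frequency of "absent"
    return 1.0 - (p * p + q * q)


# Made-up marker scores across a handful of lines.
balanced_marker = [1, 0, 1, 0]   # present in half the lines
fixed_marker = [1, 1, 1, 1]      # present everywhere, not informative

pic_balanced = pic_biallelic(balanced_marker)
pic_fixed = pic_biallelic(fixed_marker)
```

A marker present in every line carries no information for distinguishing cultivars and scores 0, while a marker split evenly between presence and absence reaches the maximum.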
- …